Product summarization extraction model with multimodal information fusion
Qiang ZHAO, Zhongqing WANG, Hongling WANG
Journal of Computer Applications    2024, 44 (1): 73-78.   DOI: 10.11772/j.issn.1001-9081.2022121910
Abstract

On online shopping platforms, concise, authentic and effective product summarizations are crucial to improving the shopping experience. Since online shoppers cannot touch the actual product, the product image provides important visual information beyond the product text description, so product summarization that fuses multimodal information, including product text and product image, is of great significance for online shopping. Aiming at fusing product text descriptions and product images, a product summarization extraction model with multimodal information fusion was proposed. Different from the general product summarization task, whose input contains only the product text description, the proposed model introduced the product image as an additional source of information to make the extracted summary richer. Specifically, a pre-trained model was first used to represent the features of the product text description and the product image: the text feature representation of each sentence was extracted from the product text description, and the overall visual feature representation of the product was extracted from the product image. Then, a low-rank tensor-based multimodal fusion method was used to fuse the text features and the overall visual features, yielding a multimodal feature representation for each sentence. Finally, the multimodal feature representations of all sentences were fed into the summary generator to generate the final product summarization. Comparative experiments were conducted on the CEPSUM 2.0 (Chinese E-commerce Product SUMmarization 2.0) dataset. On the three subsets of CEPSUM 2.0, the average ROUGE-1 (Recall-Oriented Understudy for Gisting Evaluation 1) of the proposed model is 3.12 percentage points higher than that of TextRank and 1.75 percentage points higher than that of BERTSUMExt (BERT SUMmarization Extractive). Experimental results show that the proposed model is effective in fusing product text and image information and performs well on the ROUGE evaluation metrics.

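To make the fusion step concrete, the sketch below shows, in PyTorch, how a low-rank tensor-based fusion of one sentence's text feature with the overall visual feature could be written. This is a minimal illustration only: the class name LowRankFusion, the feature dimensions, the rank value, and all hyperparameters are assumptions for the example and do not reproduce the paper's actual implementation.

```python
# Minimal sketch of low-rank tensor-based multimodal fusion (hypothetical;
# names, dimensions and rank are illustrative assumptions, not the paper's code).
import torch
import torch.nn as nn


class LowRankFusion(nn.Module):
    def __init__(self, text_dim: int, image_dim: int, out_dim: int, rank: int = 4):
        super().__init__()
        # One set of rank-r factor matrices per modality; the "+1" row corresponds
        # to appending a constant 1 to each feature so unimodal terms are retained.
        self.text_factors = nn.Parameter(torch.randn(rank, text_dim + 1, out_dim) * 0.01)
        self.image_factors = nn.Parameter(torch.randn(rank, image_dim + 1, out_dim) * 0.01)
        self.fusion_weights = nn.Parameter(torch.randn(rank) * 0.01)
        self.fusion_bias = nn.Parameter(torch.zeros(1, out_dim))

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # text_feat:  (batch, text_dim)  -- one sentence's text representation
        # image_feat: (batch, image_dim) -- overall visual representation of the product
        batch = text_feat.size(0)
        ones = text_feat.new_ones(batch, 1)
        text_h = torch.cat([text_feat, ones], dim=1)    # (batch, text_dim + 1)
        image_h = torch.cat([image_feat, ones], dim=1)  # (batch, image_dim + 1)

        # Project each modality with its rank-r factors, then fuse by element-wise
        # product, which approximates the full outer-product tensor fusion.
        text_proj = torch.einsum('bd,rdo->rbo', text_h, self.text_factors)
        image_proj = torch.einsum('bd,rdo->rbo', image_h, self.image_factors)
        fused = text_proj * image_proj                  # (rank, batch, out_dim)

        # Weighted sum over the rank dimension gives the multimodal representation.
        return torch.einsum('r,rbo->bo', self.fusion_weights, fused) + self.fusion_bias


# Example usage with assumed dimensions (e.g. 768-d sentence vectors, 2048-d image vectors).
fusion = LowRankFusion(text_dim=768, image_dim=2048, out_dim=256, rank=4)
sent_text = torch.randn(8, 768)      # per-sentence text features
img_vis = torch.randn(8, 2048)       # overall visual feature, repeated per sentence
multimodal = fusion(sent_text, img_vis)  # (8, 256), fed to the summary generator
```

The low-rank factorization avoids materializing the full outer product of the two bias-augmented feature vectors, keeping the parameter count linear in each modality's dimensionality while still modeling cross-modal interactions.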